Next up is testing Text-to-Image. Let's just run it and see.

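# Assumes the earlier setup cells already defined core, models, extras,
# core_b, models_b, extras_b, device and calculate_latent_sizes.
import torch
from tqdm import tqdm

batch_size = 4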
# caption = "Cinematic photo of an anthropomorphic nerdy rodent sitting in a cafe reading a book"
caption = "Cinematic photo of an anthropomorphic penguin sitting in a cafe reading a book and having a coffee"
height, width = 1024, 1024
stage_c_latent_shape, stage_b_latent_shape = calculate_latent_sizes(height, width, batch_size=batch_size)

# Stage C Parameters
extras.sampling_configs['cfg'] = 4
extras.sampling_configs['shift'] = 2
extras.sampling_configs['timesteps'] = 20
extras.sampling_configs['t_start'] = 1.0

# Stage B Parameters
extras_b.sampling_configs['cfg'] = 1.1
extras_b.sampling_configs['shift'] = 1
extras_b.sampling_configs['timesteps'] = 10
extras_b.sampling_configs['t_start'] = 1.0

batch = {'captions': [caption] * batch_size}
conditions = core.get_conditions(batch, models, extras, is_eval=True, is_unconditional=False, eval_image_embeds=False)
unconditions = core.get_conditions(batch, models, extras, is_eval=True, is_unconditional=True, eval_image_embeds=False)
conditions_b = core_b.get_conditions(batch, models_b, extras_b, is_eval=True, is_unconditional=False)
unconditions_b = core_b.get_conditions(batch, models_b, extras_b, is_eval=True, is_unconditional=True)

with torch.no_grad(), torch.cuda.amp.autocast(dtype=torch.float32):
    # torch.manual_seed(42)

    # Stage C: sample the compact latent from the text conditions
    sampling_c = extras.gdf.sample(
        models.generator, conditions, stage_c_latent_shape,
        unconditions, device=device, **extras.sampling_configs,
    )
    # Drain the sampler; after the loop, sampled_c holds the final step's result
    for (sampled_c, _, _) in tqdm(sampling_c, total=extras.sampling_configs['timesteps']):
        sampled_c = sampled_c

    # preview_c = models.previewer(sampled_c).float()
    # show_images(preview_c)

    # Stage B is conditioned on the Stage C latent; zeros for the unconditional branch
    conditions_b['effnet'] = sampled_c
    unconditions_b['effnet'] = torch.zeros_like(sampled_c)

    # Stage B: sample the larger latent
    sampling_b = extras_b.gdf.sample(
        models_b.generator, conditions_b, stage_b_latent_shape,
        unconditions_b, device=device, **extras_b.sampling_configs
    )
    for (sampled_b, _, _) in tqdm(sampling_b, total=extras_b.sampling_configs['timesteps']):
        sampled_b = sampled_b
    # Stage A decodes the Stage B latent into images
    sampled = models_b.stage_a.decode(sampled_b).float()

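For intuition about the two latent shapes the cell asks for, here is a minimal sketch of what a helper like calculate_latent_sizes computes. It is a hypothetical re-implementation, not the notebook's code: the name calculate_latent_sizes_sketch, the channel counts (16 for Stage C, 4 for Stage B) and the compression factors (about 42.67x and 4x) are my assumptions about Stable Cascade's latent spaces.

```python
from math import ceil


def calculate_latent_sizes_sketch(height=1024, width=1024, batch_size=4,
                                  compression_factor_c=42.67,
                                  compression_factor_b=4.0):
    """Hypothetical sketch: (N, C, H, W) latent shapes for Stage C and Stage B.

    Channel counts and compression factors are assumed values.
    """
    # Stage C latent: 16 channels, ~42.67x spatial compression (assumption)
    stage_c = (batch_size, 16,
               ceil(height / compression_factor_c),
               ceil(width / compression_factor_c))
    # Stage B latent: 4 channels, 4x spatial compression (assumption)
    stage_b = (batch_size, 4,
               ceil(height / compression_factor_b),
               ceil(width / compression_factor_b))
    return stage_c, stage_b


c_shape, b_shape = calculate_latent_sizes_sketch(1024, 1024, batch_size=4)
print(c_shape, b_shape)
```

Under these assumptions a 1024x1024 request becomes a tiny 24x24 Stage C latent and a 256x256 Stage B latent, which is why Stage C is cheap to sample with 20 steps.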

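The cfg values set above (4 for Stage C, 1.1 for Stage B) are classifier-free guidance scales, which is also why the cell builds both conditions and unconditions. Below is a minimal sketch of the standard guidance formula; I'm assuming the gdf sampler combines its two predictions this way (I haven't checked its source), and cfg_combine is a hypothetical helper working on plain Python lists instead of tensors.

```python
def cfg_combine(pred_cond, pred_uncond, cfg):
    """Classifier-free guidance, element-wise: uncond + cfg * (cond - uncond).

    This is the same quantity torch.lerp(uncond, cond, cfg) computes.
    """
    return [u + cfg * (c - u) for c, u in zip(pred_cond, pred_uncond)]


cond = [1.0, 0.5]    # toy "conditional" prediction
uncond = [0.0, 0.5]  # toy "unconditional" prediction

strong = cfg_combine(cond, uncond, 4)    # Stage C scale: pushes hard toward the caption
mild = cfg_combine(cond, uncond, 1.1)    # Stage B scale: barely extrapolates
plain = cfg_combine(cond, uncond, 1.0)   # cfg = 1 just returns the conditional prediction
print(strong, mild, plain)
```

Where the two predictions agree (the second element), guidance changes nothing; where they differ, a larger cfg amplifies the difference.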

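The two for loops in the cell look odd at first: sampled_c = sampled_c does nothing. The loops are only there to run the sampler generator to completion (tqdm just draws the progress bar), and since Python keeps a for-loop variable bound after the loop ends, sampled_c holds the last step's value. A sketch with a hypothetical toy_sampler standing in for extras.gdf.sample, assuming (from the unpacking in the cell) that it yields a 3-tuple per step:

```python
from collections import deque


def toy_sampler(steps):
    """Hypothetical stand-in for gdf.sample: yields a 3-tuple per step."""
    x = 10.0
    for _ in range(steps):
        x = x / 2  # pretend denoising step
        yield x, None, None


# Pattern used in the cell: iterate to the end, keep the first element of the last tuple
for (sampled, _, _) in toy_sampler(3):
    sampled = sampled  # no-op; the loop variable survives after the loop
print(sampled)

# Equivalent without a loop body: a deque with maxlen=1 keeps only the final tuple
last, _, _ = deque(toy_sampler(3), maxlen=1)[0]
print(last)
```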
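The result is still the triton problem. In short, triton simply has to be installed.
I'm not sure whether installing straight from the repo's requirements.txt would have set everything up smoothly. But I didn't want to touch Colab's CUDA, I have no other environment to test on, and judging by the fix I ended up with, it seems triton has to be installed separately anyway.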

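So let's check the installation instructions on triton's GitHub repo. Note: do not use pip install triton. That version is too old, and after installing it the run still fails with the same error. We need the latest version, the nightly release.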

pip install -U --index-url triton-nightly

After installation, the version is triton-nightly 3.0.0.post20240716052845.
Then I ran the Text-to-Image cell again, and it still threw an error.
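Because pip treats triton and triton-nightly as separate distributions, it helps to check which of them (and which versions) are actually installed before rerunning the cell. A small sketch using the standard library's importlib.metadata; installed_versions is a hypothetical helper name, and the keyword match on the distribution name is just a convenience.

```python
from importlib import metadata


def installed_versions(keyword):
    """Return {distribution name: version} for installed distributions whose name contains keyword."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")  # may be missing for broken installs
        if name and keyword.lower() in name.lower():
            found[name] = dist.version
    return found


print(installed_versions("triton"))
```

If both triton and triton-nightly show up, they both provide the same importable triton package, so it's worth knowing which one was installed last.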

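This error only appeared when I reran things, and it took a round of testing before I remembered: at the very beginning I had downloaded triton and built it myself using the Install from source method, and at that point I was prompted to restart the session.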


Here is the pip list as of this point, for reference:

Finally, run Text-to-Image once more. If you followed the steps above, everything should work, and the output looks like this.


